Training Language GANs from Scratch
Generative Adversarial Networks (GANs) enjoy great success at image generation, but have proven difficult to train in the domain of natural language. Challenges with gradient estimation, optimization instability, and mode collapse have led practitioners to resort to maximum likelihood pre-training, followed by small amounts of adversarial fine-tuning. The benefits of GAN fine-tuning for language generation are unclear, as the resulting models produce comparable or worse samples than traditional language models. We show it is in fact possible to train a language GAN from scratch --- without maximum likelihood pre-training. We combine existing techniques such as large batch sizes, dense rewards and discriminator regularization to stabilize and improve language GANs. The resulting model, ScratchGAN, performs comparably to maximum likelihood training on the EMNLP2017 News and WikiText-103 corpora according to quality and diversity metrics.
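The abstract's "dense rewards" refer to scoring every generated prefix rather than only the finished sentence, which reduces the variance of REINFORCE-style gradient estimates. As a rough illustration (not the authors' code; the function name, discount factor, and baseline are illustrative assumptions), here is how per-token discriminator scores would be turned into the per-token weights that multiply the generator's log-probability gradients:

```python
def reinforce_token_weights(token_rewards, gamma=0.99, baseline=0.0):
    """Per-token advantages for a REINFORCE-style generator update.

    With dense rewards, the discriminator scores every prefix, so each
    token t is credited with the discounted sum of rewards from t onward,
    rather than a single sentence-level reward delivered at the end.
    (Hypothetical sketch; gamma and baseline values are assumptions.)
    """
    returns = []
    running = 0.0
    for r in reversed(token_rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    # Subtracting a baseline further reduces gradient variance.
    return [g - baseline for g in returns]

# Dense rewards: the discriminator rates every prefix of the sample.
dense = reinforce_token_weights([0.1, 0.4, 0.9])
# Sparse reward: only the complete sentence receives a score.
sparse = reinforce_token_weights([0.0, 0.0, 0.9])
```

Under the dense scheme every token receives a meaningful learning signal, whereas under the sparse scheme early tokens see only a heavily discounted copy of the final reward.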
Reviews: Training Language GANs from Scratch
I've raised my score accordingly, but I still think there need to be more solid results. In particular, while the rebuttal notes that ScratchGAN can almost match the MLE baseline, I am not sure how strong the MLE baseline itself is. Based on sample quality, I suspect that the MLE baseline is quite weak and does not use more modern LM approaches (e.g. ...). Of course, I am not saying that the authors deliberately used weak baselines, but it would be helpful to compare against stronger MLE baselines too.
Weaknesses: - The main weakness is empirical---ScratchGAN appreciably underperforms an MLE model in terms of LM score and reverse LM score.
Reviews: Training Language GANs from Scratch
This paper has required quite a bit of discussion between the reviewers. The concern was that each individual technique proposed in the paper has been tried in the past. However, their combination enabled something that has not been shown before: training a decent text GAN model without MLE pre-training. While the submission does not provide a convincing argument for switching from MLE to GAN in text generation, it is still an important paper. While some may question whether the text GAN direction will ever deliver state-of-the-art language generation models, it is an active area.
Authors: Cyprien de Masson d'Autume, Shakir Mohamed, Mihaela Rosca, Jack Rae